# Instructions

Environment setup: `create_env.sh`

Our code builds on the public repo `https://github.com/facebookresearch/ExPLORe/tree/main`. Please refer to that GitHub repository for license information.

Also copy the `jaxrl_m` folder (https://github.com/seohongpark/HILP/tree/master/hilp_gcrl/jaxrl_m) to the root directory of the repo.

## Example Run Scripts
Pretrain VAE Skills  

For AntMaze State
```
python run_opal.py --save_dir=<save_dir> --env_name=antmaze-large-diverse-v0 --horizon_length=4 --seed=1 --debug=False --vision=False
```
For AntMaze Vision
```
python run_opal.py --save_dir=<save_dir> --env_name=antmaze-large-diverse-v0 --horizon_length=4 --seed=1 --debug=False --vision=True
```
For Kitchen 
```
python run_opal.py --save_dir=<save_dir> --env_name=kitchen-mixed-v0 --horizon_length=4 --seed=1 --debug=False --vision=False
```

Ours, AntMaze
```
python train_finetuning_supe.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=1 --project_name=antmaze-state --offline_relabel_type=min --use_rnd_offline=True --use_rnd_online=True --start_training=5000 --hpolicy_horizon=4 --utd_ratio=20 --env_name=antmaze-large-diverse-v2 --seed=1 --updates_per_step=4 --debug=False --load_dir=<checkpoint root path> --interpolate=False
```
Online w/ Traj. Skills, AntMaze
```
python train_finetuning_supe.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=1 --project_name=antmaze-state --offline_relabel_type=min --use_rnd_offline=False --use_rnd_online=True --start_training=5000 --hpolicy_horizon=4 --utd_ratio=20 --env_name=antmaze-large-diverse-v2 --seed=1 --updates_per_step=4 --debug=False --load_dir=<checkpoint root path> --interpolate=False --offline_ratio=0
```
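
The two state-based AntMaze commands above are nearly identical, so it is easy to miss which flags separate "Ours" from the online-only variant. The sketch below is a hypothetical helper, not part of this repo: `parse_flags` and `diff_flags` are illustrative names. It parses the `--name=value` tokens of two commands and reports the flags that differ. The `--load_dir` placeholder is written without spaces here only so that it tokenizes as a single argument.

```python
import shlex

# The two commands from above, copied verbatim except for the --load_dir placeholder.
OURS_ANTMAZE = (
    "python train_finetuning_supe.py --eval_episodes=10 --max_steps=300000 "
    "--config.backup_entropy=False --config.num_min_qs=1 --project_name=antmaze-state "
    "--offline_relabel_type=min --use_rnd_offline=True --use_rnd_online=True "
    "--start_training=5000 --hpolicy_horizon=4 --utd_ratio=20 "
    "--env_name=antmaze-large-diverse-v2 --seed=1 --updates_per_step=4 --debug=False "
    "--load_dir=<checkpoint_root_path> --interpolate=False"
)
ONLINE_ANTMAZE = (
    "python train_finetuning_supe.py --eval_episodes=10 --max_steps=300000 "
    "--config.backup_entropy=False --config.num_min_qs=1 --project_name=antmaze-state "
    "--offline_relabel_type=min --use_rnd_offline=False --use_rnd_online=True "
    "--start_training=5000 --hpolicy_horizon=4 --utd_ratio=20 "
    "--env_name=antmaze-large-diverse-v2 --seed=1 --updates_per_step=4 --debug=False "
    "--load_dir=<checkpoint_root_path> --interpolate=False --offline_ratio=0"
)


def parse_flags(command):
    """Map every `--name=value` token in a command string to {name: value}."""
    flags = {}
    for token in shlex.split(command):
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
    return flags


def diff_flags(cmd_a, cmd_b):
    """Return {name: (value_in_a, value_in_b)} for flags that differ; None means absent."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


print(diff_flags(OURS_ANTMAZE, ONLINE_ANTMAZE))
# {'offline_ratio': (None, '0'), 'use_rnd_offline': ('True', 'False')}
```

The same helper can be pointed at any other pair of commands in this section to check which settings separate two methods.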

ExPLORe, AntMaze
```
python train_finetuning_explore.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=1 --project_name=antmaze-state --offline_relabel_type=min --use_rnd_offline=True --use_rnd_online=True --start_training=5000 --utd_ratio=20 --env_name=antmaze-large-diverse-v2 --seed=1 --rnd_config.coeff=2
```


Ours, Visual AntMaze
```
python train_finetuning_supe_pixels.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=2 --project_name=antmaze-vision --offline_relabel_type=min --use_rnd_offline=True --use_rnd_online=True --start_training=5000 --hpolicy_horizon=4 --utd_ratio=20 --seed=1 --env_name=antmaze-large-diverse-v2 --offline_ratio=0.5 --updates_per_step=8 --config.num_qs=10 --use_icvf=True --interpolate=False
```
Online w/ Traj. Skills, Visual AntMaze
```
python train_finetuning_supe_pixels.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=2 --project_name=antmaze-vision --offline_relabel_type=min --use_rnd_offline=False --use_rnd_online=True --start_training=5000 --hpolicy_horizon=4 --utd_ratio=20 --seed=1 --env_name=antmaze-large-diverse-v2 --offline_ratio=0 --updates_per_step=8 --config.num_qs=10 --use_icvf=True --interpolate=False
```
ExPLORe, Visual AntMaze 
```
python train_finetuning_explore_pixels.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=1 --project_name=visual-antmaze --offline_relabel_type=min --use_rnd_offline=True --use_rnd_online=True --start_training=5000 --utd_ratio=20 --seed=1 --env_name=antmaze-large-diverse-v2 --offline_ratio=0.5 --updates_per_step=2 --config.num_qs=10 --use_icvf=True --rnd_config.coeff=2
```

Ours, Kitchen
```
python train_finetuning_supe.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=2 --project_name=kitchen --offline_relabel_type=pred --use_rnd_offline=True --use_rnd_online=True --start_training=5000 --utd_ratio=20 --env_name=kitchen-mixed-v0 --seed=1 --hpolicy_horizon=4 --updates_per_step=4 --debug=False --load_dir=<checkpoint root path> --config.init_temperature=1.0 --rnd_config.coeff=8 --offline_ratio=0.5 --interpolate=False
```
Online w/ Traj. Skills, Kitchen
```
python train_finetuning_supe.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=2 --project_name=kitchen --offline_relabel_type=pred --use_rnd_offline=False --use_rnd_online=True --start_training=5000 --utd_ratio=20 --env_name=kitchen-mixed-v0 --seed=1 --hpolicy_horizon=4 --updates_per_step=4 --debug=False --load_dir=<checkpoint root path> --config.init_temperature=1.0 --rnd_config.coeff=8 --offline_ratio=0 --interpolate=False
```
ExPLORe, Kitchen 
```
python train_finetuning_explore.py --eval_episodes=10 --max_steps=300000 --config.backup_entropy=False --config.num_min_qs=2 --project_name=kitchen --offline_relabel_type=pred --use_rnd_offline=True --use_rnd_online=True --start_training=5000 --utd_ratio=20 --env_name=kitchen-mixed-v0 --seed=1 --rnd_config.coeff=2.0 --config.init_temperature=1.0
```

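For the additional goal locations, we use the following maze layouts. They follow the D4RL maze convention: `1` is a wall, `0` is free space, `R` is the reset (start) cell, and `G` is the goal cell.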

## Medium
```
BIG_MAZE_TEST_CORNER_2 = [[1, 1, 1, 1, 1, 1, 1, 1],
                [1, R, 0, 1, 1, G, 0, 1],
                [1, 0, 0, 1, 0, 0, 0, 1],
                [1, 1, 0, 0, 0, 1, 1, 1],
                [1, 0, 0, 1, 0, 0, 0, 1],
                [1, 0, 1, 0, 0, 1, 0, 1],
                [1, 0, 0, 0, 1, 0, 0, 1],
                [1, 1, 1, 1, 1, 1, 1, 1]]

BIG_MAZE_TEST_CORNER_3 = [[1, 1, 1, 1, 1, 1, 1, 1],
                [1, R, 0, 1, 1, 0, 0, 1],
                [1, 0, 0, 1, 0, 0, 0, 1],
                [1, 1, 0, 0, 0, 1, 1, 1],
                [1, 0, 0, 1, 0, 0, 0, 1],
                [1, 0, 1, 0, 0, 1, 0, 1],
                [1, G, 0, 0, 1, 0, 0, 1],
                [1, 1, 1, 1, 1, 1, 1, 1]]

BIG_MAZE_TEST_CORNER_4 = [[1, 1, 1, 1, 1, 1, 1, 1],
                [1, R, 0, 1, 1, 0, 0, 1],
                [1, 0, 0, 1, 0, 0, 0, 1],
                [1, 1, 0, 0, 0, 1, 1, 1],
                [1, 0, 0, 1, 0, 0, 0, 1],
                [1, 0, 1, 0, 0, 1, 0, 1],
                [1, 0, 0, G, 1, 0, 0, 1],
                [1, 1, 1, 1, 1, 1, 1, 1]]
```
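
The layouts reference `R` and `G` without defining them. As a minimal sketch, assuming these are the reset and goal markers of the D4RL maze convention (defined here as the strings `"r"` and `"g"`), the snippet below locates both cells in one of the Medium layouts. `find_cells` is a hypothetical helper, not a function from this repo.

```python
# Cell markers, assumed to match the D4RL maze convention.
R = "r"  # reset (start) cell
G = "g"  # goal cell

# Copied from the Medium layouts above.
BIG_MAZE_TEST_CORNER_2 = [[1, 1, 1, 1, 1, 1, 1, 1],
                          [1, R, 0, 1, 1, G, 0, 1],
                          [1, 0, 0, 1, 0, 0, 0, 1],
                          [1, 1, 0, 0, 0, 1, 1, 1],
                          [1, 0, 0, 1, 0, 0, 0, 1],
                          [1, 0, 1, 0, 0, 1, 0, 1],
                          [1, 0, 0, 0, 1, 0, 0, 1],
                          [1, 1, 1, 1, 1, 1, 1, 1]]


def find_cells(layout, marker):
    """Return the (row, col) index of every cell equal to `marker`."""
    return [
        (i, j)
        for i, row in enumerate(layout)
        for j, cell in enumerate(row)
        if cell == marker
    ]


print(find_cells(BIG_MAZE_TEST_CORNER_2, R))  # [(1, 1)]
print(find_cells(BIG_MAZE_TEST_CORNER_2, G))  # [(1, 5)]
```

Running the same check on the other layouts is a quick way to confirm that each one contains exactly one reset cell and one goal cell.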

## Large
```
HARDEST_MAZE_TEST_CORNER_2 = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                    [1, R, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
                    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
                    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
                    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
                    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
                    [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
                    [1, 0, G, 1, 0, 0, 0, 1, 0, 0, 0, 1],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

HARDEST_MAZE_TEST_CORNER_3 = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                    [1, R, 0, 0, 0, 1, 0, 0, 0, 0, G, 1],
                    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
                    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
                    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
                    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
                    [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
                    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

HARDEST_MAZE_TEST_CORNER_4 = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                    [1, R, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
                    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
                    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
                    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
                    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
                    [1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
                    [1, 0, 0, 1, 0, 0, G, 1, 0, 0, 0, 1],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```

## Ultra
```
ULTRA_MAZE_TEST_CORNER_2 = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                  [1, R, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, G, 1],
                  [1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
                  [1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1],
                  [1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1],
                  [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                  [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
                  [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
                  [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1],
                  [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1],
                  [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1],
                  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

ULTRA_MAZE_TEST_CORNER_3 = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                  [1, R, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
                  [1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
                  [1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1],
                  [1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1],
                  [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                  [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
                  [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
                  [1, G, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1],
                  [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1],
                  [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1],
                  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

ULTRA_MAZE_TEST_CORNER_4 = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                  [1, R, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
                  [1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
                  [1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1],
                  [1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1],
                  [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                  [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
                  [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
                  [1, 0, 0, 0, 0, 0, 0, G, 1, 0, 1, 0, 0, 0, 0, 1],
                  [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1],
                  [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1],
                  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
